The code style we chose is BigCamelCase.

1 Topic

For our project, we are using two NFL data sets. We chose NFL data because we are both sports fans, and NFL fans in particular, and we are both interested in how football statistics break down.

2 Investigation

2.1 For the defense data, we are investigating:

How does each statistic vary by position?

Which positions lead in which stats?

2.2 For the Quarterback data, we are investigating:

How do quarterbacks' stats compare between eras?

Which quarterback had the best season?

3 Defense Data Set

This data set is from Pro Football Reference. It contains the defensive statistics for every player who recorded at least one defensive stat in the 2023/24 NFL season. A case in this data set is an individual player. Our analysis of the defense data set focuses on most of the attributes: Position, G, GS, Int, TD, PD, FF, FR, Sk, Comb, Solo, Ast, TFL, QBHits, and Sfty.

library(readxl)  # provides read_excel()
DefenseStats <- read_excel("2023 NFL Defense Stats.xlsx")

4 Defense Data Set Wrangling

We selected the columns we were interested in and dropped columns such as player name, team, and return yards, which were not needed. We then filtered the position column to keep only defensive positions. Finally, we grouped the data by position so that the table has one row per position, with each column holding the total of that stat across all players at the position.

DefS <- DefenseStats %>%
  select(Position,G,GS,Int,TD,PD,FF,FR,Sk,Comb,Solo,Ast,TFL,QBHits, Sfty)
DefS <- DefS %>% filter (Position %in% c("DT", "DE", "LB", "CB", "S"))
DefS <- DefS %>% group_by(Position) %>%
  summarize(
    Games = sum(G),
    Games_Started = sum(GS),
    Interceptions = sum(Int),
    Touchdowns = sum(TD),
    Pass_Deflections = sum(PD),
    Forced_Fumbles = sum(FF),
    Fumble_Recoveries = sum(FR),
    Sacks = sum(Sk),
    Combined_Tackles = sum(Comb),
    Solo_Tackles = sum(Solo),
    Assisted_Tackles = sum(Ast),
    Tackles_For_Loss = sum(TFL),
    QB_Hits = sum(QBHits),
    Safeties = sum(Sfty),
    .groups = "drop"
  )
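The filter, group, and total steps described above can also be written more compactly. The following is a minimal sketch on made-up data, assuming dplyr 1.0 or later for `across()`; `ToyStats` and `ToyTotals` are hypothetical names, and the columns keep their short names here rather than being renamed.

```r
library(dplyr)

# Hypothetical stand-in for DefenseStats: four made-up players, three stats.
ToyStats <- tibble(
  Position = c("LB", "LB", "CB", "QB"),
  G        = c(17, 16, 17, 17),
  Int      = c(1, 0, 4, 0),
  Sk       = c(3.5, 2.0, 0.0, 0.0)
)

ToyTotals <- ToyStats %>%
  filter(Position %in% c("DT", "DE", "LB", "CB", "S")) %>%  # drop offensive positions
  group_by(Position) %>%
  summarize(across(everything(), sum), .groups = "drop")    # total every non-grouping column

print(ToyTotals)
```

Spelling out each `sum()` by hand, as in the report, has the advantage of giving every column a readable name in the same step.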

5 Defense Data Table

DefS %>%
  kable(
    caption = "Statistic Totals for Each NFL Position",  
    booktabs = TRUE,                        
    align = rep("c", ncol(DefS))   # one alignment entry per column
  ) %>%
  kable_styling(      
    bootstrap_options = c("striped"),     
    font_size = 16
  )
Statistic Totals for Each NFL Position

| Position | Games | Games_Started | Interceptions | Touchdowns | Pass_Deflections | Forced_Fumbles | Fumble_Recoveries | Sacks | Combined_Tackles | Solo_Tackles | Assisted_Tackles | Tackles_For_Loss | QB_Hits | Safeties |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CB | 3086 | 1482 | 190 | 26 | 1163 | 89 | 67 | 43.0 | 8270 | 6213 | 2057 | 297 | 100 | 1 |
| DE | 1211 | 515 | 2 | 1 | 104 | 50 | 25 | 329.5 | 2399 | 1420 | 979 | 429 | 730 | 1 |
| DT | 2508 | 1275 | 10 | 6 | 208 | 57 | 59 | 392.5 | 5047 | 2646 | 2401 | 641 | 1046 | 3 |
| LB | 3772 | 1670 | 78 | 18 | 461 | 153 | 106 | 567.0 | 12336 | 7523 | 4813 | 1056 | 1166 | 2 |
| S | 1810 | 1034 | 149 | 12 | 466 | 70 | 50 | 63.0 | 6719 | 4569 | 2150 | 225 | 128 | 0 |

For this data table, each row is a different position. Each column is a different stat and the entries are the total of that recorded stat for all players of that position.

6 Defense Data Visualizations

For the Defense Data, we created 14 data visualizations. Each data visualization has position as the x-axis and the stat as the y-axis. Each data visualization shows which position leads in that statistic.
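Because the 14 plots differ only in the column plotted and its label, the shared code could be wrapped in a function. This is a sketch rather than the code used for the report: `PlotStat()` is a hypothetical helper, and `DefTotals` holds just two of the statistics, copied from the totals table.

```r
library(ggplot2)

# Totals for two statistics, copied from the position totals table.
DefTotals <- data.frame(
  Position = c("CB", "DE", "DT", "LB", "S"),
  Games    = c(3086, 1211, 2508, 3772, 1810),
  Sacks    = c(43.0, 329.5, 392.5, 567.0, 63.0)
)

# PlotStat() is a hypothetical helper, not part of the original analysis.
# Column is the name of the column to plot; Label is the text shown on the plot.
PlotStat <- function(Data, Column, Label) {
  ggplot(Data) +
    aes(x = Position, y = .data[[Column]]) +   # look the column up by name
    geom_col(fill = "#212221") +
    labs(
      x = "Position",
      y = Label,
      title = paste("Total", Label, "by Position for the 2023/2024 NFL Season")
    ) +
    theme_light() +
    theme(
      plot.title = element_text(size = 15L, face = "bold"),
      axis.title.y = element_text(face = "bold"),
      axis.title.x = element_text(face = "bold")
    )
}

GamesPlot <- PlotStat(DefTotals, "Games", "Games")
SacksPlot <- PlotStat(DefTotals, "Sacks", "Sacks")
```

With a helper like this, a change to the theme or fill color would only need to be made in one place.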

6.1 Games Graph

ggplot(DefS) +
  aes(x = Position, y = Games) +
  geom_col(fill = "#212221") +
  labs(
    x = "Position",
    y = "Games",
    title = "Total Games by Position for the 2023/2024 NFL Season"
  ) +
  theme_light() +
  theme(
    plot.title = element_text(size = 15L,
                              face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold")
  )

This graph shows that linebackers lead in games played and defensive ends have the fewest games played.

6.2 Games Started Graph

ggplot(DefS) +
  aes(x = Position, y = Games_Started) +
  geom_col(fill = "#212221") +
  labs(
    x = "Position",
    y = "Games Started",
    title = "Total Games Started by Position for the 2023/2024 NFL Season"
  ) +
  theme_light() +
  theme(
    plot.title = element_text(size = 15L,
                              face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold")
  )

This graph shows that linebackers lead in games started and defensive ends have the fewest games started.

6.3 Interceptions Graph

ggplot(DefS) +
  aes(x = Position, y = Interceptions) +
  geom_col(fill = "#212221") +
  labs(
    x = "Position",
    y = "Interceptions",
    title = "Total Interceptions by Position for the 2023/2024 NFL Season"
  ) +
  theme_light() +
  theme(
    plot.title = element_text(size = 15L,
                              face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold")
  )

This graph shows that cornerbacks lead in interceptions and defensive ends have the fewest interceptions.

6.4 Touchdowns Graph

ggplot(DefS) +
  aes(x = Position, y = Touchdowns) +
  geom_col(fill = "#212221") +
  labs(
    x = "Position",
    y = "Touchdowns",
    title = "Total Touchdowns by Position for the 2023/2024 NFL Season"
  ) +
  theme_light() +
  theme(
    plot.title = element_text(size = 15L,
                              face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold")
  )

This graph shows that cornerbacks lead in touchdowns and defensive ends have the fewest touchdowns.

6.5 Pass Deflections Graph

ggplot(DefS) +
  aes(x = Position, y = Pass_Deflections) +
  geom_col(fill = "#212221") +
  labs(
    x = "Position",
    y = "Pass Deflections",
    title = "Total Pass Deflections by Position for the 2023/2024 NFL Season"
  ) +
  theme_light() +
  theme(
    plot.title = element_text(size = 15L,
                              face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold")
  )

This graph shows that cornerbacks lead in pass deflections and defensive ends have the fewest pass deflections.

6.6 Forced Fumbles Graph

ggplot(DefS) +
  aes(x = Position, y = Forced_Fumbles) +
  geom_col(fill = "#212221") +
  labs(
    x = "Position",
    y = "Forced Fumbles",
    title = "Total Forced Fumbles by Position for the 2023/2024 NFL Season"
  ) +
  theme_light() +
  theme(
    plot.title = element_text(size = 15L,
                              face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold")
  )

This graph shows that linebackers lead in forced fumbles and defensive ends have the fewest forced fumbles.

6.7 Fumble Recoveries Graph

ggplot(DefS) +
  aes(x = Position, y = Fumble_Recoveries) +
  geom_col(fill = "#212221") +
  labs(
    x = "Position",
    y = "Fumble Recoveries",
    title = "Total Fumble Recoveries by Position for the 2023/2024 NFL Season"
  ) +
  theme_light() +
  theme(
    plot.title = element_text(size = 15L,
                              face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold")
  )

This graph shows that linebackers lead in fumble recoveries and defensive ends have the fewest fumble recoveries.

6.8 Sacks Graph

ggplot(DefS) +
  aes(x = Position, y = Sacks) +
  geom_col(fill = "#212221") +
  labs(
    x = "Position",
    y = "Sacks",
    title = "Total Sacks by Position for the 2023/2024 NFL Season"
  ) +
  theme_light() +
  theme(
    plot.title = element_text(size = 15L,
                              face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold")
  )

This graph shows that linebackers lead in sacks and cornerbacks have the fewest sacks.

6.9 Combined Tackles Graph

ggplot(DefS) +
  aes(x = Position, y = Combined_Tackles) +
  geom_col(fill = "#212221") +
  labs(
    x = "Position",
    y = "Combined Tackles",
    title = "Total Combined Tackles by Position for the 2023/2024 NFL Season"
  ) +
  theme_light() +
  theme(
    plot.title = element_text(size = 15L,
                              face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold")
  )

This graph shows that linebackers lead in combined tackles and defensive ends have the fewest combined tackles.

6.10 Solo Tackles Graph

ggplot(DefS) +
  aes(x = Position, y = Solo_Tackles) +
  geom_col(fill = "#212221") +
  labs(
    x = "Position",
    y = "Solo Tackles",
    title = "Total Solo Tackles by Position for the 2023/2024 NFL Season"
  ) +
  theme_light() +
  theme(
    plot.title = element_text(size = 15L,
                              face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold")
  )

This graph shows that linebackers lead in solo tackles and defensive ends have the fewest solo tackles.

6.11 Assisted Tackles Graph

ggplot(DefS) +
  aes(x = Position, y = Assisted_Tackles) +
  geom_col(fill = "#212221") +
  labs(
    x = "Position",
    y = "Assisted Tackles",
    title = "Total Assisted Tackles by Position for the 2023/2024 NFL Season"
  ) +
  theme_light() +
  theme(
    plot.title = element_text(size = 15L,
                              face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold")
  )

This graph shows that linebackers lead in assisted tackles and defensive ends have the fewest assisted tackles.

6.12 Tackles For Loss Graph

ggplot(DefS) +
  aes(x = Position, y = Tackles_For_Loss) +
  geom_col(fill = "#212221") +
  labs(
    x = "Position",
    y = "Tackles For Loss",
    title = "Total Tackles For Loss by Position for the 2023/2024 NFL Season"
  ) +
  theme_light() +
  theme(
    plot.title = element_text(size = 15L,
                              face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold")
  )

This graph shows that linebackers lead in tackles for loss and safeties have the fewest tackles for loss.

6.13 QB Hits Graph

ggplot(DefS) +
  aes(x = Position, y = QB_Hits) +
  geom_col(fill = "#212221") +
  labs(
    x = "Position",
    y = "QB Hits",
    title = "Total QB Hits by Position for the 2023/2024 NFL Season"
  ) +
  theme_light() +
  theme(
    plot.title = element_text(size = 15L,
                              face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold")
  )

This graph shows that linebackers lead in quarterback hits and cornerbacks have the fewest quarterback hits.

6.14 Safeties Graph

ggplot(DefS) +
  aes(x = Position, y = Safeties) +
  geom_col(fill = "#212221") +
  labs(
    x = "Position",
    y = "Safeties",
    title = "Total Safeties by Position for the 2023/2024 NFL Season"
  ) +
  theme_light() +
  theme(
    plot.title = element_text(size = 15L,
                              face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold")
  )

This graph shows that defensive tackles lead in safeties (the scoring play), while players at the safety position recorded none.

Overall, linebackers lead in 10 of the 14 statistics and defensive ends rank last in 10 of them. This makes sense because the table reports totals, and linebackers have the most games played and started while defensive ends have the fewest.
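The leading and trailing position for each statistic can also be read off programmatically instead of from each plot. The sketch below assumes the tidyr package is available and uses a subset of the totals copied from the table; `DefTotals` and `Leaders` are hypothetical names.

```r
library(dplyr)
library(tidyr)

# Position totals copied from the table (a subset of the 14 statistics).
DefTotals <- tibble(
  Position         = c("CB", "DE", "DT", "LB", "S"),
  Games            = c(3086, 1211, 2508, 3772, 1810),
  Interceptions    = c(190, 2, 10, 78, 149),
  Sacks            = c(43.0, 329.5, 392.5, 567.0, 63.0),
  Combined_Tackles = c(8270, 2399, 5047, 12336, 6719),
  Safeties         = c(1, 1, 3, 2, 0)
)

Leaders <- DefTotals %>%
  pivot_longer(-Position, names_to = "Stat", values_to = "Total") %>%  # one row per position and stat
  group_by(Stat) %>%
  summarize(
    Leader = Position[which.max(Total)],  # position with the largest total
    Lowest = Position[which.min(Total)],  # position with the smallest total
    .groups = "drop"
  )

print(Leaders)
```

Note that `which.max()` and `which.min()` return only the first match, so a tie for first or last place would need separate handling.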

7 Quarterback Data Set

This data set is from Kaggle and covers every quarterback season from 1970 to 2022. It includes season stats such as passing yards, touchdowns, and interceptions. The table contains many more stats, but I did not consider them important for this comparison.

Stats <- read_excel("NFL QB Stats.xlsx")

8 Quarterback Data Set Wrangling

The quarterback table needed a fair amount of wrangling. I began by keeping the top 20 quarterbacks by passing yards in each year, which reduces the influence of backups and of quarterbacks who missed much of a season through injury. The rest of the wrangling falls into two categories, Yards and Points.

Starters <- Stats%>%
  group_by(Year)%>%
  arrange(desc(`Pass Yds`))%>%
  filter(row_number()<21)%>%
  arrange(desc(Year))
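The same top-N-per-year selection can be expressed with `slice_max()`. This is a minimal sketch on made-up data, assuming dplyr 1.0 or later; `ToyQBs` and `ToyStarters` are hypothetical, and the example keeps the top 2 per year instead of the top 20.

```r
library(dplyr)

# Hypothetical toy data: three quarterbacks in each of two seasons.
ToyQBs <- tibble(
  Year       = c(2021, 2021, 2021, 2022, 2022, 2022),
  Player     = c("A", "B", "C", "D", "E", "F"),
  `Pass Yds` = c(4800, 3100, 4100, 5200, 2900, 4400)
)

# Keep the top 2 passers per season (the report keeps the top 20).
ToyStarters <- ToyQBs %>%
  group_by(Year) %>%
  slice_max(`Pass Yds`, n = 2, with_ties = FALSE) %>%  # exactly n rows per year
  ungroup() %>%
  arrange(desc(Year))

print(ToyStarters)
```

Setting `with_ties = FALSE` matches the `row_number() < 21` filter, which also keeps a fixed number of rows per year even when two quarterbacks have the same yardage.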

8.1 Yards Wrangling

Many people judge a quarterback's greatness mostly by the yards he threw for. The first step for the yards was to compute the average passing yards per year and merge it onto the Starters table created above. The 1982 season, which was shortened by a players' strike, is excluded from the yearly averages. I then added the number of games played per year, because the length of the season has changed twice since 1970 and that context is necessary for comparisons. Finally, the main stat, Average Passing Yards+ (APY+), is created and added to the table. APY+ is a standardized statistic: 100 times a quarterback's passing yards divided by that year's average, so 100 means exactly average for the year.

Era<- Starters%>%
  group_by(Year)%>%
  filter(Year != 1982)%>%
  summarise(`Year Mean` = mean(`Pass Yds`))

Temp<-Era%>%
  mutate(`Games per Year` = if_else(Year<1978,"14 Games","16 Games"))  # the NFL moved from 14 to 16 games in 1978

Temp2<-replace(Temp$`Games per Year`,Temp$Year>2020,"17 Games")

NewEra<-Era%>%
  mutate(`Games per Year` = Temp2)

MergedYards<-merge(x= Starters, y = Era, by = "Year", all.x = TRUE)%>%
  select(Year,Player, `Pass Yds`, TD, INT, `Year Mean`)

# reframe() rather than summarise(), because the result has one row per quarterback season
AdjYards<- MergedYards%>%
  reframe(Year = as.character(Year), Player = Player, TD = TD, INT= INT,`APY (Average Passing Yards)` = `Year Mean` ,Yards = `Pass Yds`, `APY+` = round(100*(`Pass Yds`/`Year Mean`),0))%>%
  arrange(desc(`APY+`))
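To make the APY+ calculation concrete, here is a small worked sketch on made-up numbers. It uses a grouped `mutate()` in place of the merge, which is an alternative formulation and not the code used above; `ToyStarters` and `ToyAdj` are hypothetical names.

```r
library(dplyr)

# Hypothetical toy seasons; the yardage values are made up for illustration.
ToyStarters <- tibble(
  Year       = c(1975, 1975, 2020, 2020),
  Player     = c("A", "B", "C", "D"),
  `Pass Yds` = c(3000, 2000, 5000, 3000)
)

ToyAdj <- ToyStarters %>%
  group_by(Year) %>%
  mutate(
    `Year Mean` = mean(`Pass Yds`),                      # average for that year
    `APY+`      = round(100 * `Pass Yds` / `Year Mean`)  # 100 = average for that year
  ) %>%
  ungroup() %>%
  arrange(desc(`APY+`))

print(ToyAdj)
```

In this toy data, player A's 3,000 yards in 1975 earn an APY+ of 120, close to player C's 125 for 5,000 yards in 2020, while player D's 3,000 yards in 2020 rate only 75. That is the kind of era adjustment the statistic is meant to provide.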

8.2 Points Wrangling

The Points variable is a better way to compare quarterbacks across years because it uses more than passing yards. Each passing yard is worth 1 point, each touchdown is worth 100 points, and each interception costs 50 points. The remaining steps are the same as for the yards wrangling above, except that the main stat of comparison is the newly created Points stat instead of passing yards.

# The formula works row by row, so the Year grouping carried over from Starters is dropped first
Points<-Starters%>%
  ungroup()%>%
  reframe(Year = Year, Player = Player, Yards = `Pass Yds`,TD = TD, INT = INT, Points = `Pass Yds` + 100*TD - 50*INT)

EraPoints<- Points%>%
  group_by(Year)%>%
  filter(Year != 1982)%>%
  summarise(`Year Mean` = mean(Points))

NewEraPoints<-EraPoints%>%
  mutate(`Games per Year` = Temp2)

MergedPoints<-merge(x= Points, y = NewEraPoints, by = "Year", all.x = TRUE)%>%
  select(Year,Player, Points, TD, INT, `Year Mean`,`Games per Year`)

# reframe() rather than summarise(), because the result has one row per quarterback season
AdjPoints<- MergedPoints%>%
  reframe(Year = as.character(Year), Player = Player, TD = TD, INT =INT,`AP (Average Points)` = `Year Mean`,
          Points = Points, `AP+` = round(100*(Points/`Year Mean`),0),`Games per Year`= `Games per Year`)%>%
  arrange(desc(`AP+`))
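As a worked example of the scoring rule, the sketch below computes Points and AP+ for one made-up season. `QBPoints()` is a hypothetical helper, and both the season line and the yearly average are invented for illustration.

```r
# QBPoints() is a hypothetical helper that mirrors the scoring rule:
# 1 point per yard, 100 per touchdown, minus 50 per interception.
QBPoints <- function(Yards, TD, INT) {
  Yards + 100 * TD - 50 * INT
}

# Made-up season line: 4,000 yards, 30 touchdowns, 10 interceptions.
SeasonPoints <- QBPoints(4000, 30, 10)             # 4000 + 3000 - 500 = 6500

# AP+ relative to a made-up yearly average of 5,000 points.
SeasonAPPlus <- round(100 * SeasonPoints / 5000)   # 130, i.e. 30% above average

print(c(Points = SeasonPoints, APPlus = SeasonAPPlus))
```

Under this rule, a touchdown counts as much as 100 passing yards and an interception cancels 50, so the weights themselves are a modeling choice that shapes the ranking.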

9 Quarterback Data Tables

These tables show the best era-adjusted quarterback seasons since 1970. I selected the top 5 by Average Passing Yards+ (APY+) and the top 5 by Average Points+ (AP+). The two lists differ, which is the main reason you cannot focus strictly on passing yards, as some people do in debates. The top APY+ seasons do not line up with the top AP+ seasons, which shows the importance of a quarterback's other stats.

9.1 Average Passing Yards+

AdjYardTable<- AdjYards%>%
  select(Year, Player, TD, INT ,`APY+`)
head(AdjYardTable,5)%>%
  kable(caption = "Best APY+ since 1970",
        align = c("l", rep("c", 4))) %>%
  kable_styling(      
    bootstrap_options = c("striped"),      
    font_size = 16)
Best APY+ since 1970

| Year | Player | TD | INT | APY+ |
|---|---|---|---|---|
| 1973 | Roman Gabriel | 23 | 12 | 172 |
| 1984 | Dan Marino | 48 | 17 | 158 |
| 1991 | Warren Moon | 23 | 21 | 150 |
| 1990 | Warren Moon | 33 | 13 | 149 |
| 1971 | John Hadl | 21 | 25 | 147 |

9.2 Average Points+

AdjPointTable<- AdjPoints%>%
  select(Year, Player, TD, INT ,`AP+`)
head(AdjPointTable,5)%>%
  kable(caption = "Best AP+ since 1970",
        align = c("l", rep("c", 4))) %>%
  kable_styling(      
    bootstrap_options = c("striped"),      
    font_size = 16)
Best AP+ since 1970

| Year | Player | TD | INT | AP+ |
|---|---|---|---|---|
| 1984 | Dan Marino | 48 | 17 | 203 |
| 1973 | Roman Gabriel | 23 | 12 | 190 |
| 1986 | Dan Marino | 44 | 23 | 189 |
| 2007 | Tom Brady | 50 | 8 | 181 |
| 2013 | Peyton Manning | 55 | 10 | 176 |

10 Quarterback Data Visualizations

These visualizations show why it is so difficult to compare players between eras, and who the most effective players were in each statistical category.

10.1 Era comparisons

The first visualizations compare average passing yards and average points from year to year, to show why players from different eras are so hard to compare. As the plots below show, the average has more than doubled between 1970 and 2022, so a good season in the 1970s would be a poor season by today's standards. This is why it is so difficult to compare players who did not play at the same time.

10.1.1 Average yards per season

ggplot(NewEra) +
  aes(x = Year, y = `Year Mean`, colour = `Games per Year`) +
  geom_point(shape = "circle", size = 2L) +
  scale_color_brewer(palette = "Set1", direction = 1) +
  labs(x = "Year", y = "Average Passing Yards") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 30L,
                              hjust = 0.5),
    axis.title.y = element_text(size = 14L),
    axis.title.x = element_text(size = 14L)
  )

10.1.2 Average points per season

ggplot(NewEraPoints) +
  aes(x = Year, y = `Year Mean`, colour = `Games per Year`) +
  geom_point(shape = "circle", size = 2L) +
  scale_color_brewer(palette = "Set1", direction = 1) +
  labs(x = "Year", y = "Average Points") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 30L,
                              hjust = 0.5),
    axis.title.y = element_text(size = 14L),
    axis.title.x = element_text(size = 14L)
  )

10.2 Best Players comparisons

These visualizations present the data tables above as bar plots.

10.2.1 Average Passing Yards+

#Best APY+ Barplot
ggplot(head(AdjYardTable, 5)) +
  aes(x = Year, y = `APY+`, fill = Player) +
  geom_col() +
  scale_fill_brewer(palette = "Set1", direction = 1) +
  labs(
    x = "Year",
    y = "APY+",
    title = "Best APY+ since 1970",
    fill = "Name"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 30L),
    axis.title.y = element_text(size = 14L),
    axis.title.x = element_text(size = 14L)
  )

10.2.2 Average Points+

#Best AP+ Barplot
ggplot(head(AdjPointTable, 5)) +
  aes(x = Year, y = `AP+`, fill = Player) +
  geom_col() +
  scale_fill_brewer(palette = "Set1", direction = 1) +
  labs(
    x = "Year",
    y = "AP+",
    title = "Best AP+ since 1970",
    fill = "Name"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 30L),
    axis.title.y = element_text(size = 14L),
    axis.title.x = element_text(size = 14L)
  )